 Head & Neck Cancer


A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Speech

arXiv.org Artificial Intelligence

Cases of laryngeal cancer are predicted to rise significantly in the coming years. Current diagnostic pathways cause many patients to be incorrectly referred to urgent suspected cancer pathways, putting undue stress on both patients and the medical system. Artificial intelligence offers a promising solution by enabling non-invasive detection of laryngeal cancer from patient speech, which could help prioritise referrals more effectively and reduce inappropriate referrals of non-cancer patients. To realise this potential, open science is crucial. A major barrier in this field is the lack of open-source datasets and reproducible benchmarks, forcing researchers to start from scratch. Our work addresses this challenge by introducing a benchmark suite comprising 36 models trained and evaluated on open-source datasets. These models are accessible in a public repository, providing a foundation for future research. The suite covers three different algorithms and three audio feature sets, offering a comprehensive benchmarking framework. We propose standardised metrics and evaluation methodologies to ensure consistent and comparable results across future studies. The presented models include both audio-only inputs and multimodal inputs that incorporate demographic and symptom data, enabling their application to datasets with diverse patient information. With these benchmarks, future researchers can evaluate their datasets, refine the models, and use them as a foundation for more advanced approaches. This work aims to provide a baseline for establishing reproducible benchmarks, enabling researchers to compare new methods against these standards and ultimately advancing the development of AI tools for detecting laryngeal cancer.
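As a concrete illustration of the kind of standardised evaluation such a benchmark calls for, the sketch below scores a binary cancer/non-cancer classifier from per-patient probabilities; the specific metrics and the 0.5 threshold are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative evaluation protocol for a binary laryngeal-cancer classifier.
# The metric choices and the 0.5 threshold are assumptions for this sketch,
# not the exact methodology from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def evaluate(y_true, y_prob, threshold=0.5):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auroc": roc_auc_score(y_true, y_prob),
        "sensitivity": tp / (tp + fn),   # recall on cancer patients
        "specificity": tn / (tn + fp),   # correct rejection of non-cancer patients
    }

# Toy usage with made-up labels and probabilities.
print(evaluate([0, 0, 1, 1, 1], [0.2, 0.6, 0.7, 0.4, 0.9]))
```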


Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy

arXiv.org Artificial Intelligence

Esophageal cancer is among the most common types of cancer worldwide. It is traditionally treated using open esophagectomy, but in recent years, robot-assisted minimally invasive esophagectomy (RAMIE) has emerged as a promising alternative. However, robot-assisted surgery can be challenging for novice surgeons, as they often suffer from a loss of spatial orientation. Computer-aided anatomy recognition holds promise for improving surgical navigation, but research in this area remains limited. In this study, we developed a comprehensive dataset for semantic segmentation in RAMIE, featuring the largest collection of vital anatomical structures and surgical instruments to date. Handling this diverse set of classes presents challenges, including class imbalance and the recognition of complex structures such as nerves. This study aims to understand the challenges and limitations of current state-of-the-art algorithms on this novel dataset and problem. Therefore, we benchmarked eight real-time deep learning models using two pretraining datasets. We assessed both traditional and attention-based networks, hypothesizing that attention-based networks better capture global patterns and address challenges such as occlusion caused by blood or other tissues. The benchmark includes our RAMIE dataset and the publicly available CholecSeg8k dataset, enabling a thorough assessment of surgical segmentation tasks. Our findings indicate that pretraining on ADE20k, a dataset for semantic segmentation, is more effective than pretraining on ImageNet. Furthermore, attention-based models outperform traditional convolutional neural networks, with SegNeXt and Mask2Former achieving higher Dice scores, and Mask2Former additionally excelling in average symmetric surface distance.
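A minimal sketch of per-class Dice scoring for multi-class surgical segmentation follows; the class count and the convention of skipping classes absent from both masks (one way the class imbalance mentioned above is handled) are assumptions, not the benchmark's exact protocol.

```python
# Minimal per-class Dice for multi-class segmentation label maps (H x W).
# The number of classes and the skip-empty convention are illustrative choices.
import numpy as np

def per_class_dice(pred, gt, num_classes):
    scores = {}
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        denom = p.sum() + g.sum()
        if denom == 0:            # class absent from both masks: skip it
            continue
        scores[c] = 2.0 * np.logical_and(p, g).sum() / denom
    return scores

pred = np.random.randint(0, 5, size=(256, 256))
gt = np.random.randint(0, 5, size=(256, 256))
print(per_class_dice(pred, gt, num_classes=5))
```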


Deep Learning for Longitudinal Gross Tumor Volume Segmentation in MRI-Guided Adaptive Radiotherapy for Head and Neck Cancer

arXiv.org Artificial Intelligence

Accurate segmentation of gross tumor volume (GTV) is essential for effective MRI-guided adaptive radiotherapy (MRgART) in head and neck cancer. However, manual segmentation of the GTV over the course of therapy is time-consuming and prone to interobserver variability. Deep learning (DL) has the potential to overcome these challenges by automatically delineating GTVs. In this study, our team, UW LAIR, tackled the challenges of both pre-radiotherapy (pre-RT) (Task 1) and mid-radiotherapy (mid-RT) (Task 2) tumor volume segmentation. To this end, we developed a series of DL models for longitudinal GTV segmentation. The backbone of our models for both tasks was SegResNet with deep supervision. For Task 1, we trained the model using a combined dataset of pre-RT and mid-RT MRI data, which resulted in an improved aggregated Dice similarity coefficient (DSCagg) on an internal testing set compared to models trained solely on pre-RT MRI data. In Task 2, we introduced mask-aware attention modules, enabling pre-RT GTV masks to influence intermediate features learned from mid-RT data. This attention-based approach yielded slight improvements over the baseline method, which concatenated mid-RT MRI with pre-RT GTV masks as input. In the final testing phase, the ensemble of 10 pre-RT segmentation models achieved an average DSCagg of 0.794, with 0.745 for primary GTV (GTVp) and 0.844 for metastatic lymph nodes (GTVn) in Task 1. For Task 2, the ensemble of 10 mid-RT segmentation models attained an average DSCagg of 0.733, with 0.607 for GTVp and 0.859 for GTVn, leading us to achieve 1st place. In summary, we presented a collection of DL models that could facilitate GTV segmentation in MRgART, offering the potential to streamline radiation oncology workflows. Our code and model weights are available at https://github.com/xtie97/HNTS-MRG24-UWLAIR.
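The aggregated Dice (DSCagg) pools intersections and volumes across the whole cohort before forming the ratio, rather than averaging per-case Dice; the numpy sketch below follows that common definition and should be read as illustrative rather than the challenge's reference implementation.

```python
# Aggregated Dice across a cohort: sum intersections and volumes over all
# cases before forming the ratio (robust to cases with tiny or absent GTVs).
# This follows the common DSCagg definition; treat it as a sketch rather than
# the exact challenge implementation.
import numpy as np

def dsc_agg(pred_masks, gt_masks):
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(pred_masks, gt_masks))
    total = sum(p.sum() + g.sum() for p, g in zip(pred_masks, gt_masks))
    return 2.0 * inter / total if total > 0 else 1.0

cohort_pred = [np.random.rand(64, 64, 64) > 0.5 for _ in range(3)]
cohort_gt = [np.random.rand(64, 64, 64) > 0.5 for _ in range(3)]
print(dsc_agg(cohort_pred, cohort_gt))
```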


UMambaAdj: Advancing GTV Segmentation for Head and Neck Cancer in MRI-Guided RT with UMamba and nnU-Net ResEnc Planner

arXiv.org Artificial Intelligence

Magnetic Resonance Imaging (MRI) plays a crucial role in MRI-guided adaptive radiotherapy for head and neck cancer (HNC) due to its superior soft-tissue contrast. However, accurately segmenting the gross tumor volume (GTV), which includes both the primary tumor (GTVp) and lymph nodes (GTVn), remains challenging. Recently, two deep learning segmentation innovations have shown great promise: UMamba, which effectively captures long-range dependencies, and the nnU-Net Residual Encoder (ResEnc), which enhances feature extraction through multistage residual blocks. In this study, we integrate these strengths into a novel approach, termed 'UMambaAdj'. Our proposed method was evaluated on the HNTS-MRG 2024 challenge test set using pre-RT T2-weighted MRI images, achieving an aggregated Dice Similarity Coefficient (DSCagg) of 0.751 for GTVp and 0.842 for GTVn, with a mean DSCagg of 0.796. This approach demonstrates potential for more precise tumor delineation in MRI-guided adaptive radiotherapy, ultimately improving treatment outcomes for HNC patients. Team: DCPT-Stine's group.
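For orientation, the sketch below shows a generic residual convolutional encoder block of the kind that ResEnc-style backbones stack; the layer sizes and normalisation are assumptions, not the nnU-Net ResEnc or UMambaAdj configuration.

```python
# Generic residual 3D conv block: two conv-norm-activation layers plus a skip
# connection, the building pattern behind residual-encoder backbones.
# Channel counts and InstanceNorm are illustrative, not the paper's config.
import torch
import torch.nn as nn

class ResidualBlock3D(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.norm1 = nn.InstanceNorm3d(out_ch, affine=True)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1)
        self.norm2 = nn.InstanceNorm3d(out_ch, affine=True)
        self.act = nn.LeakyReLU(inplace=True)
        # 1x1x1 projection so the skip path matches shape when needed
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv3d(in_ch, out_ch, 1, stride=stride))

    def forward(self, x):
        y = self.act(self.norm1(self.conv1(x)))
        y = self.norm2(self.conv2(y))
        return self.act(y + self.skip(x))

x = torch.randn(1, 1, 32, 32, 32)   # (batch, channel, D, H, W) T2-weighted patch
print(ResidualBlock3D(1, 32, stride=2)(x).shape)
```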


Gradient Map-Assisted Head and Neck Tumor Segmentation: A Pre-RT to Mid-RT Approach in MRI-Guided Radiotherapy

arXiv.org Artificial Intelligence

Radiation therapy (RT) is a vital part of treatment for head and neck cancer, where accurate segmentation of gross tumor volume (GTV) is essential for effective treatment planning. This study investigates the use of pre-RT tumor regions and local gradient maps to enhance mid-RT tumor segmentation for head and neck cancer in MRI-guided adaptive radiotherapy. By leveraging pre-RT images and their segmentations as prior knowledge, we address the challenge of tumor localization in mid-RT segmentation. A gradient map of the tumor region from the pre-RT image is computed and applied to mid-RT images to improve tumor boundary delineation. Our approach demonstrated improved segmentation accuracy for both primary GTV (GTVp) and nodal GTV (GTVn), though performance was limited by data constraints. The final DSCagg scores from the challenge's test set evaluation were 0.534 for GTVp, 0.867 for GTVn, and a mean score of 0.70. This method shows potential for enhancing segmentation and treatment planning in adaptive radiotherapy. Team: DCPT-Stine's group.
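One way such a gradient-map prior could be wired in is sketched below: a gradient-magnitude map restricted to a margin around the pre-RT GTV is stacked with the (assumed co-registered) mid-RT image as an extra input channel; the dilation margin and channel stacking are illustrative choices, not the paper's exact pipeline.

```python
# Build a gradient-magnitude map restricted to a margin around the pre-RT GTV
# and stack it with the mid-RT image as an additional input channel.
# The 5-voxel dilation, the stacking strategy, and the assumption that the
# pre-RT and mid-RT volumes are already registered are illustrative.
import numpy as np
from scipy import ndimage

def gtv_gradient_channel(pre_rt_img, pre_rt_gtv, mid_rt_img, margin=5):
    gx, gy, gz = np.gradient(pre_rt_img.astype(np.float32))
    grad_mag = np.sqrt(gx**2 + gy**2 + gz**2)
    region = ndimage.binary_dilation(pre_rt_gtv > 0, iterations=margin)
    grad_map = np.where(region, grad_mag, 0.0)   # keep gradients near the GTV only
    return np.stack([mid_rt_img.astype(np.float32), grad_map], axis=0)

pre_img = np.random.rand(48, 48, 48)
pre_gtv = np.zeros((48, 48, 48)); pre_gtv[20:28, 20:28, 20:28] = 1
mid_img = np.random.rand(48, 48, 48)
print(gtv_gradient_channel(pre_img, pre_gtv, mid_img).shape)  # (2, 48, 48, 48)
```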


Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis

arXiv.org Artificial Intelligence

Some automatic systems have shown robust performance and stability by learning from expert decisions [6, 7]. In 2024, Nguyen et al. [8] introduced a system that leverages the Automatic Speech Recognition (ASR)-based Wav2Vec2 model [9], known for its strong capability in learning speech representations. This approach compared self-supervised learning (SSL) and the ASR dimension for speech quality assessment. The fine-tuned ASR-based model, applied to automated speech disorder quality assessment tasks, yields impressive results and sets a new baseline for Head and Neck Cancer speech contexts. This demonstrates that the ASR dimension from Wav2Vec2 closely aligns with assessment dimensions. Despite its effectiveness, this system remains a black box, with no clear interpretation of the connection between the model's ASR dimension and clinical assessments. This paper presents the first analysis of this baseline model for speech quality assessment, focusing on intelligibility and severity tasks.
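An illustrative sketch of placing an utterance-level assessment head on an ASR-pretrained Wav2Vec2 encoder with HuggingFace transformers is given below; the checkpoint name, mean pooling, and single regression output are assumptions, not the baseline's actual architecture.

```python
# Utterance-level speech-quality scorer on top of an ASR-pretrained Wav2Vec2
# encoder. The checkpoint, pooling, and regression head are illustrative; the
# paper's baseline model is not reproduced here.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

class SpeechQualityScorer(nn.Module):
    def __init__(self, checkpoint="facebook/wav2vec2-base-960h"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)   # e.g. a severity score

    def forward(self, input_values):
        hidden = self.encoder(input_values).last_hidden_state       # (B, T, H)
        return self.head(hidden.mean(dim=1)).squeeze(-1)            # mean-pool over time

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
wave = torch.randn(16000).numpy()                                   # 1 s of fake 16 kHz audio
inputs = extractor(wave, sampling_rate=16000, return_tensors="pt")
print(SpeechQualityScorer()(inputs.input_values).shape)             # torch.Size([1])
```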


MAR-DTN: Metal Artifact Reduction using Domain Transformation Network for Radiotherapy Planning

arXiv.org Artificial Intelligence

For the planning of radiotherapy treatments for head and neck cancers, Computed Tomography (CT) scans of the patients are typically employed. However, in patients with head and neck cancer, the quality of standard CT scans generated using kilo-Voltage (kVCT) tube potentials is severely degraded by streak artifacts occurring in the presence of metallic implants such as dental fillings. Some radiotherapy devices offer the possibility of acquiring Mega-Voltage CT (MVCT) scans for daily patient setup verification; because of the higher energy of the X-rays used, MVCT scans are almost entirely free from artifacts, making them more suitable for radiotherapy treatment planning. In this study, we combine the advantages of kVCT scans with those of artifact-free MVCT scans. We propose a deep learning-based approach capable of generating artifact-free MVCT images from acquired kVCT images. The outcome offers the benefits of artifact-free MVCT images with enhanced soft tissue contrast, harnessing valuable information obtained through kVCT technology for precise therapy calibration. Our proposed method employs a UNet-inspired model and is compared with adversarial learning and transformer networks. This first and unique approach achieves remarkable success, with a PSNR of 30.02 dB across the entire patient volume and 27.47 dB in artifact-affected regions exclusively. It is worth noting that the PSNR calculation excludes the background, concentrating solely on the region of interest.
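A minimal sketch of a PSNR computation restricted to a region of interest, in the spirit of the background-excluded evaluation described above; the data range and mask convention are illustrative assumptions.

```python
# PSNR computed only over a region-of-interest mask (e.g. patient body or
# artifact-affected voxels), excluding background as described above.
# The data_range value is an illustrative assumption.
import numpy as np

def psnr_roi(pred, target, roi_mask, data_range=2000.0):
    diff = (pred - target)[roi_mask > 0]
    mse = np.mean(diff ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10((data_range ** 2) / mse)

target = np.random.rand(128, 128) * 2000
pred = target + np.random.randn(128, 128) * 50
roi = np.zeros((128, 128)); roi[32:96, 32:96] = 1
print(f"PSNR in ROI: {psnr_roi(pred, target, roi):.2f} dB")
```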


Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy

arXiv.org Artificial Intelligence

Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies purely on manual processes by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variations. Although advancements in artificial intelligence (AI) techniques have significantly enhanced the auto-contouring of normal tissues, accurate delineation of RT target volumes remains a challenge. In this study, we propose a visual language model-based RT target volume auto-delineation network termed Radformer. The Radformer utilizes a hierarchical vision transformer as the backbone and incorporates large language models to extract text-rich features from clinical data. We introduce a visual language attention module (VLAM) for integrating visual and linguistic features for language-aware visual encoding (LAVE). The Radformer has been evaluated on a dataset comprising 2985 patients with head-and-neck cancer who underwent RT. Metrics, including the Dice similarity coefficient (DSC), intersection over union (IOU), and 95th percentile Hausdorff distance (HD95), were used to evaluate the performance of the model quantitatively. Our results demonstrate that the Radformer has superior segmentation performance compared to other state-of-the-art models, validating its potential for adoption in RT practice.
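For reference, the sketch below computes the 95th percentile Hausdorff distance (HD95) between two binary masks via surface voxels and KD-trees; spacing handling is simplified and this is not the paper's evaluation code.

```python
# 95th percentile Hausdorff distance (HD95) between binary masks: extract
# surface voxels, then take the 95th percentile of symmetric surface-to-surface
# distances. Spacing handling is simplified; this is an illustrative sketch.
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def surface_voxels(mask, spacing):
    surface = mask & ~ndimage.binary_erosion(mask)
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)

def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    p, g = surface_voxels(pred > 0, spacing), surface_voxels(gt > 0, spacing)
    d_pg, _ = cKDTree(g).query(p)    # pred surface -> nearest gt surface
    d_gp, _ = cKDTree(p).query(g)    # gt surface -> nearest pred surface
    return max(np.percentile(d_pg, 95), np.percentile(d_gp, 95))

gt = np.zeros((64, 64, 64), dtype=bool); gt[20:40, 20:40, 20:40] = True
pred = np.zeros_like(gt); pred[22:42, 20:40, 20:40] = True
print(hd95(pred, gt, spacing=(1.0, 1.0, 1.0)))
```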


Let it shine: Autofluorescence of Papanicolaou-stain improves AI-based cytological oral cancer detection

arXiv.org Artificial Intelligence

Oral cancer is a global health challenge. It is treatable if detected early, but it is often fatal in late stages. There is a shift from the invasive and time-consuming tissue sampling and histological examination, toward non-invasive brush biopsies and cytological examination. Reliable computer-assisted methods are essential for cost-effective and accurate cytological analysis, but the lack of detailed cell-level annotations impairs model effectiveness. This study aims to improve AI-based oral cancer detection using multimodal imaging and deep fusion. We combine brightfield and fluorescence whole slide microscopy imaging to analyze Papanicolaou-stained liquid-based cytology slides of brush biopsies collected from both healthy and cancer patients. Due to limited cytological annotations, we utilize a weakly supervised deep learning approach using only patient-level labels. We evaluate various multimodal fusion strategies, including early, late, and three recent intermediate fusion methods. Our results show: (i) fluorescence imaging of Papanicolaou-stained samples provides substantial diagnostic information; (ii) multimodal fusion enhances classification and cancer detection accuracy over single-modality methods. Intermediate fusion is the leading method among the studied approaches. Specifically, the Co-Attention Fusion Network (CAFNet) model excels with an F1 score of 83.34% and accuracy of 91.79%, surpassing human performance on the task. Additional tests highlight the need for precise image registration to optimize multimodal analysis benefits. This study advances cytopathology by combining deep learning and multimodal imaging to enhance early, non-invasive detection of oral cancer, improving diagnostic accuracy and streamlining clinical workflows. The developed pipeline is also applicable in other cytological settings. Our codes and dataset are available online for further research.
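A small PyTorch sketch of intermediate fusion via cross-attention between brightfield and fluorescence feature tokens follows; it illustrates the co-attention idea generically and is not the CAFNet architecture.

```python
# Generic co-attention style intermediate fusion: each modality's tokens attend
# to the other's, and the pooled result feeds a patient-level classifier.
# Dimensions and the single attention layer are illustrative, not CAFNet itself.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=2):
        super().__init__()
        self.bf_to_fl = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fl_to_bf = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, bf_tokens, fl_tokens):
        bf_attn, _ = self.bf_to_fl(bf_tokens, fl_tokens, fl_tokens)  # brightfield queries fluorescence
        fl_attn, _ = self.fl_to_bf(fl_tokens, bf_tokens, bf_tokens)  # fluorescence queries brightfield
        fused = torch.cat([bf_attn.mean(dim=1), fl_attn.mean(dim=1)], dim=-1)
        return self.classifier(fused)

bf = torch.randn(1, 50, 256)   # 50 brightfield patch embeddings
fl = torch.randn(1, 50, 256)   # 50 fluorescence patch embeddings
print(CrossModalFusion()(bf, fl).shape)   # torch.Size([1, 2])
```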


Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models

arXiv.org Artificial Intelligence

Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life. Commonly used metrics for assessing pathological speech are subjective, prompting the need for automated and unbiased evaluation methods. Some research has focused on using self-supervised speech models to automatically assess the speech severity level [13, 14, 15]. Other studies analysed how well diseases can be predicted by these models. For instance, A. Favaro et al. [16] compared interpretable speech features to embeddings produced by SSL models for predicting the presence of Parkinson's disease. They showed that using embeddings provides better detection accuracies at the cost of losing the insight into speech and language deterioration given by interpretable features. While being able to detect a disease and assess its severity is important, we believe it is just as important to interpret the output of these models, in order to enhance the trust that clinicians can have in these systems. This study therefore proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, to enhance accuracy and improve the discrimination of phonetic features for subsequent interpretability purposes. The impact of pre-training datasets, model size, and fine-tuning datasets and parameters is explored. Evaluation on diverse corpora reveals the effectiveness of the Wav2Vec2 architecture, outperforming CNN-based approaches.
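A hedged sketch of a frame-level phone classifier on top of a self-supervised Wav2Vec2 encoder is shown below; the checkpoint, phone-inventory size, and plain linear head are assumptions, not the study's trained model.

```python
# Frame-level phone classification head on a self-supervised Wav2Vec2 encoder:
# each ~20 ms frame representation gets a phone label. The checkpoint and the
# 40-phone inventory are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class PhoneClassifier(nn.Module):
    def __init__(self, checkpoint="facebook/wav2vec2-base", num_phones=40):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_phones)

    def forward(self, input_values):
        frames = self.encoder(input_values).last_hidden_state   # (B, T_frames, H)
        return self.head(frames)                                 # (B, T_frames, num_phones)

wave = torch.randn(1, 16000)                        # 1 s of fake 16 kHz audio
logits = PhoneClassifier()(wave)
print(logits.shape, logits.argmax(dim=-1).shape)    # per-frame phone predictions
```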